-
Notifications
You must be signed in to change notification settings - Fork 146
fix(proxy): implement endpoint stickiness on retries #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Non-network errors (PROVIDER_ERROR): keep same endpoint on retry - Network errors (SYSTEM_ERROR): switch to next lowest-latency endpoint - Endpoint exhaustion: stay at last endpoint without wrap-around - Scope: single request only, no cross-request stickiness Introduced `currentEndpointIndex` variable to track endpoint independently from `attemptCount`. Only network errors (SYSTEM_ERROR) advance the index; HTTP errors (PROVIDER_ERROR) keep the same endpoint for retry. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary of ChangesHello @ding113, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the proxy's resilience and efficiency by implementing a more sophisticated retry strategy. It introduces 'endpoint stickiness,' where the system intelligently decides whether to retry on the same endpoint or switch to a new one based on the nature of the error encountered. This optimization aims to improve the success rate of requests by adapting to different failure modes, such as transient network issues versus persistent provider-side problems. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthrough代理转发器引入了端点粘性机制,通过独立跟踪当前端点索引来管理重试中的端点选择。在系统错误时,端点索引会递进到下一个可用端点,而非固定重试同一端点。相应的测试套件已更新以验证新的端点切换语义。 Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly implements endpoint stickiness for retries, distinguishing between network and provider errors. The logic to advance the endpoint index only on SYSTEM_ERROR and to clamp the index to avoid wrap-around is sound. The accompanying tests are thorough and cover the new behaviors well, including scenarios with mixed errors and endpoint exhaustion. The changes improve the proxy's resilience and retry strategy. I have one suggestion to improve the readability of a test mock setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review Summary
This PR implements endpoint stickiness on retries by distinguishing between network errors (SYSTEM_ERROR) and HTTP errors (PROVIDER_ERROR). The implementation is clean and well-tested.
PR Size: S
- Lines changed: 443 (422 additions, 21 deletions)
- Files changed: 2
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate
- Code clarity - Good
Analysis
Core Logic: The PR introduces currentEndpointIndex to track endpoint selection independently from attemptCount. This correctly implements the desired behavior:
- SYSTEM_ERROR (network failures) advances to the next endpoint
- PROVIDER_ERROR (HTTP 4xx/5xx) retries on the same endpoint
- No wrap-around when endpoints are exhausted (stays at last endpoint)
Test Coverage: The PR adds 4 new test cases and updates 4 existing tests to properly use SYSTEM_ERROR for endpoint switching scenarios. All tests correctly mock categorizeErrorAsync and use appropriate error types (network errors vs ProxyError).
Error Handling: The implementation correctly calls categorizeErrorAsync to determine error type before deciding whether to advance the endpoint index. The advancement logic is only triggered in the SYSTEM_ERROR branch (line 793-805 in forwarder.ts).
Code Quality: Comments are accurate and explain the stickiness behavior clearly. The Math.min(currentEndpointIndex, endpointCandidates.length - 1) clamp correctly prevents index out of bounds.
Automated review by Claude AI
Summary
Implements endpoint stickiness on retries to improve retry reliability and reduce unnecessary endpoint cycling. This PR differentiates between network errors (which should try a different endpoint) and HTTP errors (which should retry the same endpoint).
Related Issues:
Problem
The previous retry logic used
attemptCountto cycle through endpoints, which caused two issues:This led to suboptimal retry behavior where:
Solution
Introduced
currentEndpointIndexvariable to track endpoint selection independently fromattemptCount, enabling different retry behaviors based on error category:Math.min(currentEndpointIndex, endpointCandidates.length - 1)Changes
Core Changes
currentEndpointIndexvariable to track endpoint selection independently (line 283)currentEndpointIndexinstead ofattemptCount(lines 320-326)Math.min()clamping (line 325)Supporting Changes
SYSTEM_ERROR: should switch to next endpoint on each network error retryPROVIDER_ERROR: should keep same endpoint on non-network error retrySYSTEM_ERROR: should not wrap around when endpoints exhaustedmixed errors: PROVIDER_ERROR should not advance endpoint indexTesting
Automated Tests
Manual Testing Scenarios
Checklist
Description enhanced by Claude AI
🤖 Generated with Claude Code
Greptile Overview
Greptile Summary
This PR implements endpoint stickiness on retries to improve retry reliability and reduce unnecessary endpoint cycling. The implementation introduces a
currentEndpointIndexvariable that tracks endpoint selection independently fromattemptCount, enabling different retry behaviors based on error type.Key Changes:
Math.min(currentEndpointIndex, endpointCandidates.length - 1)Implementation Quality:
The code change is clean and well-documented with inline comments explaining the behavior. The existing tests were properly updated to use SYSTEM_ERROR for endpoint switching scenarios, and the new tests thoroughly validate all edge cases including mixed error scenarios.
Confidence Score: 5/5
Important Files Changed
currentEndpointIndexseparately fromattemptCount, advancing only on SYSTEM_ERROR (network failures)Sequence Diagram
sequenceDiagram participant Client participant ProxyForwarder participant Endpoint1 participant Endpoint2 participant Endpoint3 Note over ProxyForwarder: currentEndpointIndex = 0 Note over ProxyForwarder: attemptCount = 0 Client->>ProxyForwarder: send(request) ProxyForwarder->>Endpoint1: attempt 1 (index=0) Endpoint1-->>ProxyForwarder: SYSTEM_ERROR (network) Note over ProxyForwarder: currentEndpointIndex++ (0→1) Note over ProxyForwarder: attemptCount++ (0→1) Note over ProxyForwarder: wait 100ms ProxyForwarder->>Endpoint2: attempt 2 (index=1) Endpoint2-->>ProxyForwarder: PROVIDER_ERROR (HTTP 500) Note over ProxyForwarder: currentEndpointIndex stays at 1 Note over ProxyForwarder: attemptCount++ (1→2) Note over ProxyForwarder: wait 100ms ProxyForwarder->>Endpoint2: attempt 3 (index=1, sticky) Endpoint2-->>ProxyForwarder: SYSTEM_ERROR (network) Note over ProxyForwarder: currentEndpointIndex++ (1→2) Note over ProxyForwarder: attemptCount++ (2→3) Note over ProxyForwarder: wait 100ms ProxyForwarder->>Endpoint3: attempt 4 (index=2) Endpoint3-->>ProxyForwarder: Success (200 OK) ProxyForwarder-->>Client: Response